Evaluating the validity of clustering results based on density criteria and multi-representatives

Authors

  • Maria Halkidi
  • Michalis Vazirgiannis
Abstract

Although the goal of clustering is intuitively compelling and its notion arises in many fields, it has been difficult to define a unified approach to the clustering problem, and thus diverse clustering approaches abound in the research community. These approaches are based on different clustering principles and assumptions, and they often lead to qualitatively different results. As a consequence, the results of clustering algorithms (i.e., data set partitionings) need to be evaluated for validity against widely accepted criteria. In this paper a cluster validity index, CDbw, is introduced, which assesses the compactness and separation of the partitions generated by a clustering algorithm. Given a data set and a set of clustering algorithms, the index enables: i) the selection of the input parameter values that lead an algorithm to the best possible partitioning of the data set, and ii) the selection of the algorithm that provides the optimal partitioning of the data set. CDbw efficiently handles arbitrarily shaped clusters by representing each cluster with a number of points rather than a single representative point. The properties of the validity index are theoretically justified. A full implementation and experimental results confirm the reliability of the validity index and show that its performance compares favorably to that of several others.
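
The selection procedure described in the abstract (score each candidate partitioning with a validity index and keep the best) can be sketched as follows. This is a minimal illustration only: `toy_validity_index` is a hypothetical stand-in that uses a single centroid per cluster, so it is not CDbw, which relies on density criteria and multiple representatives per cluster. The function names, the candidate labels, and the sample points are all assumptions made for the example.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def toy_validity_index(points, labels):
    """Stand-in index (NOT CDbw): higher is better.

    Ratio of the smallest centroid-to-centroid distance (separation)
    to the mean point-to-own-centroid distance (compactness).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # a single cluster has no separation to measure
    centers = {lab: centroid(ps) for lab, ps in clusters.items()}
    spread = sum(math.dist(p, centers[lab])
                 for p, lab in zip(points, labels)) / len(points)
    keys = list(centers)
    separation = min(math.dist(centers[a], centers[b])
                     for i, a in enumerate(keys) for b in keys[i + 1:])
    return separation / spread if spread > 0 else math.inf


def select_best(points, candidates):
    """Score every candidate partitioning and return (best name, all scores).

    `candidates` maps a name (a parameter setting or an algorithm)
    to the list of cluster labels that it produced for `points`.
    """
    scores = {name: toy_validity_index(points, labels)
              for name, labels in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {
    "k=2": [0, 0, 0, 1, 1, 1],          # matches the two groups
    "k=3": [0, 0, 1, 2, 2, 2],          # splits the first group
    "interleaved": [0, 1, 0, 1, 0, 1],  # ignores the groups entirely
}
best, scores = select_best(points, candidates)
print(best, scores)
```

The same loop covers both uses named in the abstract: the keys of `candidates` can be parameter values of one algorithm or the names of different algorithms. Swapping in a real index only requires replacing `toy_validity_index` with a function of the same shape.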

Similar resources

Using Clustering and Factor Analysis in Cross Section Analysis Based on Economic-Environment Factors

Homogeneity of groups is important in studies that use cross-section and multi-level data. Most studies in economics, especially panel data analyses, need some kind of homogeneity to ensure the validity of results. This paper presents methods for clustering and homogenizing groups in cross-section studies based on enviro-economic components. For this, a sample of 92 countries which...

A density-based cluster validity approach using multi-representatives

Although the goal of clustering is intuitively compelling and its notion arises in many fields, it is difficult to define a unified approach to address the clustering problem and thus diverse clustering algorithms abound in the research community. These algorithms, under different clustering assumptions, often lead to qualitatively different results. As a consequence the results of clustering a...

A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research showed that this theory can significantly increase the stability and performance of...

Publication date: 2004